Abstract With graphical Markov models, one can investigate complex dependences,
summarize some results of statistical analyses with graphs and use these
graphs to understand implications of well-fitting models. The models have a rich 
history and form an area that has been intensively studied and developed in recent 
years. We give a brief review of the main concepts and describe in more detail a flex- 
ible subclass of models, called traceable regressions. These are sequences of joint 
response regressions for which regression graphs permit one to trace and thereby un- 
derstand pathways of dependence. We use these methods to reanalyze and interpret 
data from a prospective study of child development, now known as the 'Mannheim 
Study of Children at Risk'. The two related primary features concern cognitive and
motor development of a child at the ages of 4.5 and 8 years. Deficits in these features
form a sequence of joint responses. Several possible risks are assessed at birth of the 
child and when the child reached age 3 months and 2 years. 



1 Introduction 

To observe and understand relations among several features of individuals or ob- 
jects is one of the central tasks in many substantive fields of research, including the 
medical, social, environmental and technological sciences. Statistical models can 
help considerably with such tasks provided they are both flexible enough to apply 
to a wide variety of different types of situation and precise enough to guide us in 
thinking about possible alternative relationships. This requires in particular joint re- 
sponses, which contain continuous random variables, discrete random variables or 
both types, in addition to only single responses. 

Causal inquiries, the search for causes and their likely consequences, motivate
much empirical research. They rely on appropriate representations of relevant path- 
ways of dependence as they develop over time, often called data generating pro- 
cesses. Causes which start pathways with adverse consequences may be called risk 
factors or risks. Knowing relevant pathways offers in principle the opportunity to in- 
tervene, aiming to stop the accumulation of some of the risks, and thereby to prevent 
or at least alleviate their negative consequences. 

Properties of persons or objects and features, such as attitudes or behavior of in- 
dividuals, which can vary for the units or individuals under study, form the variables 
that are represented in statistical models. A relationship is called a strong positive 
dependence if knowing one feature makes it much more likely that the other feature 
is present as well. If, however, prediction of a feature cannot be improved by know- 
ing the other, then the relation of the two is called an independence. Whenever such 
relations only hold under certain conditions, then they are qualified to be conditional 
dependences or independences. 

Graphs, with nodes representing variables and edges indicating dependences, 
serve several purposes. These include incorporating available knowledge at the
planning stage of an empirical study, summarizing aspects important for interpretation
after detailed statistical analyses and predicting, when possible, effects of
interventions, of alternative analyses of a given set of data or of changes compared to
results from other studies with an identical core set of variables.

Corresponding statistical models are called graphical Markov models. Their 
graphs are simple when they have at most one edge for any variable pair even 
though there may be different types of edge. The graphs can represent different 
aspects of pathways, such as the conditional independence structure, the set of all 
independence statements implied by a graph, or they indicate which variables are 
needed to generate joint distributions. In the latter case, the graph represents a re- 
search hypothesis on variables that make an important contribution. Theoretical and 
computational work has progressed strongly during the last few years. 

In the following, we give first some preliminary considerations. Then we describe 
some of the history of graphical Markov models and the main features of their most 
flexible subclass, called traceable regressions. We illustrate some of the insights 
to be gained with sequences of joint regressions, that turn out to be traceable in 
a prospective study of child development, now known as the Mannheim Study of 
Children at Risk. 



2 Several preliminary considerations 

Graphical Markov models are of interest in different contexts. In the present paper, 
we stress data analysis and interpretation. From this perspective, a number of con- 
siderations arise. In a given study, we have objects or individuals, here children, and 
their appropriate selection into the study is important. Each individual has properties 
or features, represented as variables in statistical models. 

A first important consideration is that for any two variables, either one is a possible
outcome to the other, regarded as possibly explanatory, or the two variables
are to be treated as of equal standing. Usually, an outcome or response refers to a 
later time period than a possibly explanatory feature. In contrast, an equal standing 
of two or more features is appropriate when they refer to the same time period or all 
of them are likely to be simultaneously affected by an intervention. 

On the basis of this, we typically organize the variables for planned statistical 
analyses into a series of blocks, often corresponding to a time ordering. All relations
between variables within the same block are undirected, whereas those between
variables in different blocks are directed in the way described. 

An edge between two nodes in the graph, representing a statistical dependence 
between two variables, may thus be of at least two types. To represent a statistical 
dependence of an outcome on an explanatory feature, we use a directed edge with an 
arrow pointing to the outcome from the explanatory feature. For relations between 
features of equal standing, we use undirected edges. 

In fact, it turns out to be useful to have two types of undirected edge. A dashed 
line is used to represent the dependence between two outcomes or responses given 
variables in their past. By contrast, a full line in the block of variables describing the 
background or context of the study and early features of the individuals under study, 
represents a conditional dependence given all remaining background variables. 

From one viewpoint, the role of the graphical representation is to specify statisti- 
cal independences that can be used to simplify understanding. From a complemen- 
tary perspective, often the more immediately valuable, the purpose is to show those 
strong dependences that will be the base for interpreting pathways of dependence. 



3 Some history of graphical Markov models 

The development of graphical Markov models started with undirected, full line 
graphs; see Wermuth (1976), Darroch, Lauritzen and Speed (1980). The results 
built, for discrete random variables, on the log-linear models studied by Birch 
(1963), Goodman (1970), Bishop, Fienberg and Holland (1975), and for Gaussian 
variables, on the covariance selection models by Dempster (1972). Shortly afterwards, the
models were extended to acyclic directed graph models for Gaussian and for dis- 
crete random variables; see Wermuth (1980), Wermuth and Lauritzen (1983). With 
the new model classes, results from the beginning of the 20th century by geneticist 
Sewall Wright and by probabilist Andrej Markov were combined and extended. 

These generalizations differ from those achieved with structural equations that 
were studied intensively in the 1950's within econometrics; see for instance Bollen 
(1989). Structural equation models extend sequences of linear, multiple regression 
equations by permitting explicitly endogenous responses. These have residuals that 
are correlated with some or all of the regressors. For such endogenous responses, 
equation parameters need not measure conditional dependences, missing edges in 
graphs of structural equations need not correspond to any independence statement 
and no simple local modelling may be feasible. This contrasts with traceable regressions;
see Section 4.1.

Wright had used directed acyclic graphs, that is graphs with only directed edges 
and no variables of equal standing, to represent linear generating processes. He de- 
veloped 'path analysis' to judge whether such processes were well compatible with 
his data. Path analyses were recognized by Tukey (1954) to be fully ordered, also 
called 'recursive', sequences of linear multiple regressions in standardized variables.

With his approach, Wright was far ahead of his time, since, for example, formal 
statistical tests of goodness of fit were developed much later; see Wilks (1938). 
Conditions under which directed acyclic graphs represent independence structures 
for almost arbitrary types of random variables were studied later still; see Pearl 
(1988), Studeny (2005). 

One main objective of traceable regressions is to uncover graphical represen- 
tations that lead to an understanding of data generating processes. These are not 
restricted to linear relations although they may include linear processes as special 
cases. A probabilistic data generating process is a recursive sequence of conditional 
distributions in which response variables can be vector variables that may contain 
discrete or continuous components or both types. Each of the conditional distributions
specifies both the dependences of a joint response, $Y_a$ say, on components in an
explanatory variable vector, $Y_b$, and the undirected dependences among individual
response component pairs of $Y_a$.

Graphical Markov models generalize sequences of single responses and single 
explanatory variables that have been extensively studied as Markov chains. Markov 
had recognized at the beginning of the 20th century that seemingly complex joint 
probability distributions may be radically simplified by using the notion of condi- 
tional independence. 

In a Markov chain of random variables $Y_1, \dots, Y_d$, the joint distribution is built up
by starting with the marginal density $f_d$ of $Y_d$ and generating then the conditional
density $f_{d-1|d}$. At the next step, conditional independence of $Y_{d-2}$ from $Y_d$ given
$Y_{d-1}$ is taken into account, with $f_{d-2|d-1,d} = f_{d-2|d-1}$. One continues such that
with $f_{i|i+1,\dots,d} = f_{i|i+1}$, response $Y_i$ is conditionally independent of
$Y_{i+2}, \dots, Y_d$ given $Y_{i+1}$, written compactly in terms of nodes as
$i \perp\!\!\!\perp \{i+2, \dots, d\} | \{i+1\}$, and ends, finally, with
$f_{1|2,\dots,d} = f_{1|2}$, where $Y_1$ has just $Y_2$ as an important, directly explanatory
variable.

The fully directed graph, that captures such a Markov chain, is a single directed
path of arrows. For five nodes, $d = 5$, and node set $N = \{1,2,3,4,5\}$, the graph is

$1 \leftarrow 2 \leftarrow 3 \leftarrow 4 \leftarrow 5$.

This graph corresponds to a factorization of the joint density $f_N$ given by

$f_N = f_{1|2} f_{2|3} f_{3|4} f_{4|5} f_5$.

The three defining local independence statements given directly by the above factorization
or by the graph are: $1 \perp\!\!\!\perp \{3,4,5\}|2$, $2 \perp\!\!\!\perp \{4,5\}|3$ and $3 \perp\!\!\!\perp 5|4$. One also says
that in such a generating process, each response $Y_i$ 'remembers of its past just the
nearest neighbour', the nearest past variable $Y_{i+1}$.
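
The factorization and its first defining independence can be checked numerically. The following is a minimal sketch, assuming five binary variables and made-up transition probabilities (the values `P_ONE_GIVEN_NEXT` and `P_Y5_ONE` are illustrative assumptions, not taken from any study): it builds the joint distribution from the five factors and verifies that the conditional density of $Y_1$ given all of $Y_2, \dots, Y_5$ coincides with the one given $Y_2$ alone.

```python
from itertools import product

# Hypothetical values: P(Y_i = 1 | Y_{i+1} = y) for i = 1..4, and P(Y_5 = 1).
P_ONE_GIVEN_NEXT = {0: 0.2, 1: 0.7}
P_Y5_ONE = 0.4


def cond(y, y_next):
    """Conditional probability f_{i|i+1}(y | y_next) for binary variables."""
    p_one = P_ONE_GIVEN_NEXT[y_next]
    return p_one if y == 1 else 1.0 - p_one


def joint(y):
    """f_N = f_{1|2} f_{2|3} f_{3|4} f_{4|5} f_5 for y = (y1, ..., y5)."""
    p = P_Y5_ONE if y[4] == 1 else 1.0 - P_Y5_ONE
    for i in range(4):
        p *= cond(y[i], y[i + 1])
    return p


def marginal(fixed):
    """Sum the joint over all configurations agreeing with `fixed`
    (a dict mapping 0-based positions to values)."""
    return sum(
        joint(y)
        for y in product((0, 1), repeat=5)
        if all(y[i] == v for i, v in fixed.items())
    )


def max_violation():
    """Largest |f(y1 | y2, ..., y5) - f(y1 | y2)| over all configurations;
    zero (up to rounding) when 1 is independent of {3,4,5} given 2."""
    worst = 0.0
    for y in product((0, 1), repeat=5):
        given_all = joint(y) / marginal({1: y[1], 2: y[2], 3: y[3], 4: y[4]})
        given_next = marginal({0: y[0], 1: y[1]}) / marginal({1: y[1]})
        worst = max(worst, abs(given_all - given_next))
    return worst
```

Calling `max_violation()` returns a value that is zero up to floating-point rounding, which is the numerical counterpart of the statement that $Y_1$ remembers only $Y_2$ of its past.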

Directed acyclic graphs are the most direct generalization of Markov chains. 
They have a fully ordered sequence of single nodes, representing individual response
variables for which conditional densities given their past generate $f_N$. No
pairs of variables are on an equal standing. In contrast to a simple Markov chain, 
in this more general setting, each response may 'remember any subset or all of the 
variables in its past'.

Directed acyclic graphs are also used for Bayesian networks where the node set 
may not only consist of random variables, that correspond to features of observable 
units, but can represent decisions or parameters. As a framework for understanding 
possible causes and risk factors, directed acyclic graphs are too limited since they exclude
the possibility of an intervention affecting several responses simultaneously. 

One early objective of graphical Markov models was to capture independence 
structures by appropriate graphs. As mentioned before, an independence structure 
is the set of all independence statements implied by the given graph. Such a structure 
is to be satisfied by any family of densities, $f_N$, said to be generated over a given
graph. 

In principle, all independence statements that arise from a given set of defining 
statements of a graph, may be derived from basic laws of probability by using the 
standard properties satisfied by any probability distribution and possibly some additional
ones, as described for regression graphs in Section 4.1; see also Frydenberg
(1990) for a discussion of properties needed to combine independence statements 
captured by directed acyclic graphs. 

The above Markov chain implies for instance also 

$1 \perp\!\!\!\perp 4|3$, $\{1,2\} \perp\!\!\!\perp \{4,5\}|3$, and $2 \perp\!\!\!\perp 4|\{1,3,5\}$.

For many variables, methods defined for graphs simplify considerably the task of 
deciding for a given independence statement whether it is implied by a graph. Such
methods have been called separation criteria; see Geiger, Verma and Pearl (1990), 
Lauritzen et al. (1990) and Marchetti and Wermuth (2009) for different but equiva- 
lent separation criteria for directed acyclic graphs. 
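
As an illustration of how such a criterion can be applied mechanically, here is a minimal sketch of a moralization-based check for directed acyclic graphs, written in the spirit of the criterion attributed above to Lauritzen et al. (1990). The function names and the dictionary representation of the graph (each node mapped to the nodes with arrows pointing to it) are assumptions made for this example.

```python
def ancestral_set(parents, nodes):
    """Smallest set containing `nodes` and all of their ancestors."""
    result, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in result:
            result.add(n)
            stack.extend(parents.get(n, ()))
    return result


def implies_independence(parents, a, b, c):
    """True if the DAG implies a _||_ b | c: restrict to the ancestral set
    of a, b, c, moralize it, and test whether c separates a from b."""
    a, b, c = set(a), set(b), set(c)
    keep = ancestral_set(parents, a | b | c)

    # Moral graph: drop directions and join all parents of a common child.
    nbrs = {n: set() for n in keep}
    for child in keep:
        pa = list(parents.get(child, ()))
        for p in pa:
            nbrs[child].add(p)
            nbrs[p].add(child)
        for x in pa:
            for y in pa:
                if x != y:
                    nbrs[x].add(y)

    # Search from a without passing through nodes in c.
    seen, stack = set(), list(a)
    while stack:
        n = stack.pop()
        if n in seen or n in c:
            continue
        seen.add(n)
        stack.extend(nbrs[n] - seen)
    return not (seen & b)


# The Markov chain 1 <- 2 <- 3 <- 4 <- 5: arrows point to node i from i + 1.
CHAIN = {1: [2], 2: [3], 3: [4], 4: [5], 5: []}
```

For the five-node chain, `implies_independence(CHAIN, {1}, {4}, {3})`, `implies_independence(CHAIN, {1, 2}, {4, 5}, {3})` and `implies_independence(CHAIN, {2}, {4}, {1, 3, 5})` all hold, whereas `implies_independence(CHAIN, {1}, {3}, set())` does not.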

For ordered sequences of vector variables, permitting joint instead of only single 
responses, the graphs are directed acyclic in blocks of vector variables. These blocks 
are sometimes called the 'chain elements' of the corresponding 'chain graphs'. Four 
different types of such graphs for discrete variables have been classified and studied 
by Drton (2009). He proves that two types of chain graph have the desirable property 
of defining always curved exponential families for discrete distributions; see for 
instance Cox (2006) for the latter concept. 

This property holds for the 'LWF-chain graphs' of Lauritzen and Wermuth 
(1989) and Frydenberg (1990), and for the graphs of Cox and Wermuth (1993, 1996) 
that have more recently been slightly extended and studied as 'regression graphs'; 
see Wermuth and Sadeghi (2012), Sadeghi and Marchetti (2012). With the added 
feature that each edge in the graph corresponds to a dependence that is substantial in
a given context, they become 'traceable regressions'; see Wermuth (2012). 

Most books by statisticians on graphical Markov models focus on undirected 
graphs and on LWF-chain graphs; see Højsgaard, Edwards and Lauritzen (2012),
Edwards (2000), Lauritzen (1996), Whittaker (1990). In this class of graphical 
Markov models, each dependence between a response and a variable in its past 
is considered to be conditional also on all other components within the same joint 
response. 

Main distinguishing features between different types of chain graph are the con- 
ditioning sets for the independences, associated with the missing edges, and for the 
edges present in the graph. For regression graphs, conditioning sets always exclude
other components of a given response vector, and the criteria for reading off the
graph all implied independences do not change when the last chain element contains
an undirected, full-line graph. It is in this general form that we introduce
this class of models here. The separation criteria for these models are generalized
versions of the criteria that apply to directed acyclic graphs. 

Figure 1 shows two sets of joint responses and a set of background variables,
ordered by time. The two related joint responses concern aspects of cognitive and
motor development at age 8 years (abbreviated by $Y_8, X_8$, respectively) and at age
4.5 years ($Y_4, X_4$). There are two risks, measured up to 2 years, $Y_r, X_r$, where $Y_r$ is
regarded as a main risk for cognitive development and $X_r$ as a main risk for motor
development. Two more potential risks are available already at age 3 months of 
the child. Detailed definitions of the variables, a description of the study design 
and of further statistical results are given in Laucht, Esser and Schmidt (1997) and 
summarized in Wermuth and Laucht (2012). 



8 years: Cognitive deficits, Y8; Motoric deficits, X8
4 1/2 years: Cognitive deficits, Y4; Motoric deficits, X4
up to 2 years: Psycho-social risk, Yr; Biological-motoric risk, Xr; Unprotective environment at 3 months, E; Hospitalization up to 3 months, H

Fig. 1 Ordering of the variables given by time; the joint responses of primary interest are Y8, X8,
those of secondary interest are Y4, X4, the four context variables are risks known up to age 2.



4 Sequences of regressions and their regression graphs 



The well-fitting regression graph in Figure 2 is for the variables of Figure 1 and for
data of 347 families participating in the Mannheim study from birth of their first
child until the child reached the age of 8 years. The graph results from the statistical
analyses reported in Section 4.2. These are further discussed in Section 4.3.



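[Figure 2: regression graph; node labels recoverable from the extraction include Y8 (cognitive deficits, 8 yrs), X8 (motoric deficits, 8 yrs), E (unprotective environment) and H (hospitalized)]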



Fig. 2 A well-fitting regression graph for data of the child development study; arrows pointing 
from regressors in the past to a response in the future; dashed lines for dependent responses given 
their past; full lines for dependent early risk factors given the remaining background variables. 

The goodness-of-fit of the graph to the given data is assessed by local modelling,
which includes here linear and nonlinear dependences. The following Table 1 gives
a summary in terms of Wilkinson's model notation that is in common use for generalized
linear models and two coefficients of determination, $R^2$. There is a good fit
for quantitative responses when the changes from $R^2_{\mathrm{full}}$ to $R^2_{\mathrm{sel}}$ are small, that is from
the regression of an individual response on all variables in its past to a regression on
only a reduced set of selected regressors.



Table 1 Fitted equations in Wilkinson's notation

Response   Selected model        R^2_full   R^2_sel
Y8:        Y4 + X4^2 + E + H     0.67       0.67
X8:        X4^2 + Xr             0.36       0.36
Y4:        Yr + Xr^2             0.25       0.25
X4:        Yr + Xr^2             0.37       0.36
Yr:        E^2                   0.57       0.56
Xr:        E + H                 0.35       0.35

Note that any square term implies that also a main effect is included.



4.1 Explanations and definitions 

In each regression graph, arrows point from the past to the future. An arrow is 
present, between a response and a variable in its past, when there is a substantively 
important dependence, that is also statistically significant, given all its remaining 
regressors. Regressors are recognized in the graph by arrows pointing to a given 
response node. 




The undirected dependence between two individual components of a response
vector is indicated here by a dashed line; some authors draw instead a bi-directed 
edge. Such an edge is present if there is a substantial dependence between two re- 
sponse components given the past of the considered joint response. An undirected 
edge between two context variables is a full line. Such an edge is present when 
there is a substantial dependence given the remaining context variables. An edge is 
missing, when for this variable pair no dependence can be detected, of the type just
described.

The important elements of this representation are node pairs $i,k$, possibly connected
by an edge, and a full set ordering $g_1 < g_2 < \dots < g_J$ for the connected
components $g_j$ of a regression graph. The connected components of the graph are
uniquely obtained by deleting all arrows from the graph and keeping all nodes and
all undirected edges. In general, several orderings may be compatible with a given
graph since different generating processes may lead to the same independence structure.

There is further an ordered partitioning of the node set into two parts, that is a
split of $N$ as $N = (u, v)$, such that response node sets $g_1, \dots$ are in $u$ and background
node sets $\dots, g_J$ are in $v$. In Figure 2, there are two sets in $u$: $g_1 = \{Y_8, X_8\}$ and
$g_2 = \{Y_4, X_4\}$. The subgraph of the background variables is for $v = g_3 = \{Y_r, X_r, E, H\}$
and there is only one compatible ordering of the three sets $g_j$.
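
The step of deleting all arrows and keeping the undirected edges can be sketched in a few lines. In the following minimal example, the node labels follow Figure 1, but the edge lists `DASHED`, `FULL` and `ARROWS` are hypothetical and chosen only so that the three sets named above result; they are not read off the fitted graph.

```python
def connected_components(nodes, undirected_edges):
    """Connected components of a regression graph once all arrows are
    deleted, so that only dashed and full lines remain."""
    neighbours = {n: set() for n in nodes}
    for a, b in undirected_edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    components, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in component:
                component.add(n)
                stack.extend(neighbours[n] - component)
        seen |= component
        components.append(component)
    return components


NODES = ["Y8", "X8", "Y4", "X4", "Yr", "Xr", "E", "H"]
# Hypothetical edges, for illustration only.
DASHED = [("Y8", "X8"), ("Y4", "X4")]         # between response components
FULL = [("Yr", "E"), ("E", "Xr"), ("Xr", "H")]  # between context variables
ARROWS = [("Y4", "Y8"), ("Yr", "Y4")]         # deleted, hence never passed on

components = connected_components(NODES, DASHED + FULL)
```

With these assumed edges, `components` lists the sets {Y8, X8}, {Y4, X4} and {Yr, Xr, E, H} in the order in which their first node appears in `NODES`.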

Within v, the undirected graph is commonly called a concentration graph, re- 
minding us of the parameterization for a Gaussian distribution, where a concentra- 
tion, an element in the inverse covariance matrix, is a multiple of the partial corre- 
lation given all remaining variables; see Cox and Wermuth (1996), Section 3.4, or 
Wermuth (1976).

Within $u$, the undirected graph induced by the set $g_j$ is instead a conditional
covariance graph given the past of $g_j$, the nodes in $g_{>j} = \{g_{j+1}, \dots, g_J\}$; see Wermuth,
Cox and Marchetti (2009), Wiedenbeck and Wermuth (2010) for related estimation
tasks. Arrows may point from any node in $g_j$ for $j > 1$ to its future in
$g_{<j} = \{g_1, \dots, g_{j-1}\}$ but never to its past. Thus within each $g_j$, there are only undirected
edges and all arrows point from nodes in $g_j$ to nodes in $g_{<j}$, where $g_{<1} = \emptyset$.

With $g_{>J} = \emptyset$, the basic factorization of a family of densities $f_N$, generated over
a regression graph, $G^N_{\mathrm{reg}}$, is

$f_N = f_{u|v} f_v$ with $f_{u|v} = \prod_{g_j \subset u} f_{g_j|g_{>j}}$ and $f_v = \prod_{g_j \subset v} f_{g_j}$, (1)

and the family satisfies all independence constraints implied by the graph. 
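
As a worked instance, offered as a sketch rather than as part of the original derivation, factorization (1) can be written out for the three ordered sets of the child development example, assuming $g_1 = \{Y_8, X_8\}$ and $g_2 = \{Y_4, X_4\}$ in $u$ and $v = g_3 = \{Y_r, X_r, E, H\}$:

```latex
\begin{align*}
f_N &= f_{u|v}\, f_v, \\
f_{u|v} &= f_{g_1|g_2 g_3}\; f_{g_2|g_3}, \qquad f_v = f_{g_3}, \\
f_N &= f_{Y_8 X_8 \,|\, Y_4 X_4 Y_r X_r E H}\;
       f_{Y_4 X_4 \,|\, Y_r X_r E H}\;
       f_{Y_r X_r E H}.
\end{align*}
```

Each joint response is thus conditioned on everything in its past, and the background variables enter only through their joint density.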

For $i,k$ a node pair, and $c \subset N \setminus \{i,k\}$, we write $i \perp\!\!\!\perp k|c$ for $Y_i, Y_k$ conditionally
independent given $Y_c$. In terms of a joint conditional density $f_{ik|c}$, this is equivalent
to the following constraints on conditional densities:

$i \perp\!\!\!\perp k|c \iff (f_{i|kc} = f_{i|c}) \iff (f_{ik|c} = f_{i|c} f_{k|c})$.

For every variable pair $Y_i, Y_k$ making an important contribution to the generating
process of $f_N$, we say it is conditionally dependent given $Y_c$ for some $c \subset N \setminus \{i,k\}$
specified in Definition 1 below and write $i \pitchfork k|c$. A regression graph is said to be
edge-minimal if every missing edge in the graph corresponds to a conditional independence
statement and every edge present is taken to represent a dependence; see
the following definition.

Definition 1. Defining pairwise dependences of $G^N_{\mathrm{reg}}$. An edge-minimal regression
graph specifies with $g_1 < \dots < g_J$ a generating process for $f_N$ where the following
dependences

i - - - k:  $i \pitchfork k|g_{>j}$ for $i, k$ response nodes in $g_j$ of $u$,
i <--- k:  $i \pitchfork k|g_{>j} \setminus \{k\}$ for response node $i$ in $g_j$ of $u$ and node $k$ in $g_{>j}$,  (2)
i --- k:  $i \pitchfork k|v \setminus \{i,k\}$ for $i, k$ context nodes in $v$,

define the edges present in $G^N_{\mathrm{reg}}$. The meaning of each corresponding edge missing
in $G^N_{\mathrm{reg}}$ results with the dependence sign $\pitchfork$ replaced by the independence sign $\perp\!\!\!\perp$.

By equation (2), a unique independence statement is assigned to the missing edge
of each uncoupled node pair $i,k$. To combine independence statements implied by
a regression graph, two properties are needed, called composition and intersection; 
see Sadeghi and Lauritzen (2012). The properties are stated below in Definition 3(1) 
as a same joint independence implied by the two independence statements under 
bullet points 2 and 3 on the right-hand side. In their simplest form, the two properties 
can be illustrated with two simple 3-node graphs. 

For all trivariate probability distributions, one knows $i \perp\!\!\!\perp hk \implies (i \perp\!\!\!\perp h$ and $i \perp\!\!\!\perp k)$
as well as $i \perp\!\!\!\perp hk \implies (i \perp\!\!\!\perp h|k$ and $i \perp\!\!\!\perp k|h)$. The reverse implications are the composition
and the intersection property, respectively. Thus, whenever node $i$ is isolated
from the coupled nodes $h, k$ in a 3-node regression graph, it is to be interpreted as
$i \perp\!\!\!\perp hk$ and this type of subgraph in three nodes $i, h, k$ results, under composition, by
removing the $ih$-arrow and the $ik$-arrow in the following graph on the left and under
intersection in the following graph on the right. These small examples show already
that the two properties are used implicitly in the selection of regressors; the composition
property for multivariate regressions and the intersection property for directed
acyclic graph models.



For the tracing of dependences, we need both of these properties but also the 
following, called singleton transitivity. It is best explained in terms of the Vs of a 
regression graph, the subgraphs in 3 nodes having 2 edges. In a regression graph, 
there can be at most 8 different V-configurations. Such a V in three nodes, $(i,o,k)$
say, has uncoupled endpoints $i, k$ and inner node $o$.

The V configurations in $G^N_{\mathrm{reg}}$ are of two different types. In $G^N_{\mathrm{reg}}$, the collision Vs
are:

i - - - o <--- k,   i ---> o <--- k,   i - - - o - - - k,

and the transmitting Vs are:

i <--- o <--- k,   i <--- o --- k,   i --- o --- k,   i <--- o - - - k,   i <--- o ---> k.

These generalize the 3 different possible Vs in a directed acyclic graph. For such an
edge-minimal graph, the two uncoupled nodes $i,k$ of a transmitting V have either an
important common-source node (as above on the right) or an important intermediate
node (as above on the left), while the two uncoupled nodes $i,k$ of a collision V with
two arrows pointing to its inner node, have an important, common response, $o$.

Singleton transitivity means that a unique independence statement is assigned
to the endpoints $i,k$ of each V of an edge-minimal graph: either the inner node $o$
is included or excluded in every independence statement implied by the graph for
$i, k$. For a strange parametrization under which singleton transitivity is violated in
a trivariate discrete family of distributions, see Wermuth (2012).

Expressed equivalently, let node pair $i,k$ be uncoupled in an edge-minimal
$G^N_{\mathrm{reg}}$ and consider a further node $o$ and a set $c \subset N \setminus \{i, o, k\}$. Under singleton transitivity,
for both the independences $i \perp\!\!\!\perp k|c$ and $i \perp\!\!\!\perp k|oc$ to hold, one of the constraints
$o \perp\!\!\!\perp i|c$ or $o \perp\!\!\!\perp k|c$ has to be satisfied as well. Without singleton transitivity, the path
of a V in nodes $(i, o, k)$ can never induce a dependence for the endpoints $i, k$.

Definition 2. Dependence-base regression graph. An edge-minimal $G^N_{\mathrm{reg}}$ is said to
form a dependence base when its defining independences and dependences are com- 
bined by using standard properties of all probability distributions and the three ad- 
ditional properties: intersection, composition and singleton transitivity. 

A dependence base regression graph, $G^N_{\mathrm{reg}}$, is edge-inducing by marginalizing
over the inner node of a transmitting V and by conditioning on the inner node of a 
collision V. This can be expressed more precisely. 

Theorem 1. Implications of Vs in a dependence-base regression graph (Wermuth,
2012). For each V in three nodes, $(i,o,k)$ of a dependence-base $G^N_{\mathrm{reg}}$, there exists
some $c \subset N \setminus \{i,o,k\}$, such that the graph implies ($i \perp\!\!\!\perp k|oc$ and $i \pitchfork k|c$) when it is a
transmitting V, while it implies ($i \perp\!\!\!\perp k|c$ and $i \pitchfork k|oc$) when it is a collision V.
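
The two cases of the theorem can be illustrated in the simplest Gaussian setting with $c = \emptyset$. The following sketch assumes linear generating equations with unit-variance residuals and arbitrary illustrative coefficients `a` and `b`; the function names are made up for this example. It computes the covariance matrices implied by a transmitting V, $i \leftarrow o \leftarrow k$, and by a collision V, $i \rightarrow o \leftarrow k$, and compares the marginal and the partial correlation of the endpoints.

```python
import math

# Variable order in every covariance matrix: index 0 = i, 1 = o, 2 = k.


def transmitting_cov(a, b):
    """Covariances implied by Y_o = a*Y_k + e_o and Y_i = b*Y_o + e_i,
    with Y_k, e_o, e_i independent and of unit variance."""
    var_o = a * a + 1.0
    var_i = b * b * var_o + 1.0
    return [[var_i, b * var_o, a * b],
            [b * var_o, var_o, a],
            [a * b, a, 1.0]]


def collision_cov(a, b):
    """Covariances implied by Y_o = a*Y_i + b*Y_k + e_o,
    with Y_i, Y_k, e_o independent and of unit variance."""
    var_o = a * a + b * b + 1.0
    return [[1.0, a, 0.0],
            [a, var_o, b],
            [0.0, b, 1.0]]


def corr(cov, i, k):
    """Marginal correlation of variables i and k."""
    return cov[i][k] / math.sqrt(cov[i][i] * cov[k][k])


def partial_corr(cov, i, k, o):
    """Partial correlation of variables i and k given the single variable o."""
    num = cov[i][k] * cov[o][o] - cov[i][o] * cov[k][o]
    den = math.sqrt((cov[i][i] * cov[o][o] - cov[i][o] ** 2)
                    * (cov[k][k] * cov[o][o] - cov[k][o] ** 2))
    return num / den
```

For any nonzero `a` and `b`, the transmitting V gives a vanishing partial correlation of $i, k$ given $o$ together with a nonzero marginal correlation, while the collision V gives a vanishing marginal correlation together with a nonzero partial correlation given $o$.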

The requirement appears to be elementary, but some densities or families of densities
$f_N$, even when generated over a dependence base $G^N_{\mathrm{reg}}$, may have such peculiar
parameterizations that both statements $i \perp\!\!\!\perp k|oc$ and $i \perp\!\!\!\perp k|c$ can hold even though both
node pairs $i,o$ and $o,k$ are coupled by an edge. Thus, singleton transitivity needs to
be explicitly carried over to a generated density. 

We sum up as follows. For a successful tracing of pathways of dependence in an 
edge-minimal regression graph, all three properties are needed: composition, inter- 
section and singleton transitivity. Intersection holds in all positive distributions and 
the composition property holds whenever nonlinear and interactive effects also have 
non-vanishing linear dependences or main effects. 

Singleton transitivity is satisfied in binary distributions; see Simpson (1951). 
More generally, it holds when families of densities are generated over $G^N_{\mathrm{reg}}$ that have
a rich enough parametrization, such as the conditional Gaussian distributions of 
Lauritzen and Wermuth (1989) that contain discrete and continuous responses. 
Definition 3. Characterizing properties of traceable regressions. Traceable regressions
are densities $f_N$ generated over a dependence base $G^N_{\mathrm{reg}}$, that have for disjoint
subsets $a, b, c, d$ of $N$

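(1) three equivalent decompositions of the same joint independence

 • $b \perp\!\!\!\perp ac|d \iff (b \perp\!\!\!\perp a|cd$ and $b \perp\!\!\!\perp c|d)$,
 • $b \perp\!\!\!\perp ac|d \iff (b \perp\!\!\!\perp a|d$ and $b \perp\!\!\!\perp c|d)$,
 • $b \perp\!\!\!\perp ac|d \iff (b \perp\!\!\!\perp a|cd$ and $b \perp\!\!\!\perp c|ad)$,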



(2) edge-inducing Vs of $G^N_{\mathrm{reg}}$ are dependence-inducing for $f_N$.

One outstanding feature of traceable regressions is that many of their conse- 
quences can be derived by just using the graph, for instance when one is marginal- 
izing over some variables in set M, and conditioning on other variables in set C. 
In particular, graphs can be obtained for node sets N' = N \ {C,M} which capture 
precisely the independence structure implied by $G^N_{\mathrm{reg}}$, the generating graph in the
larger node set $N$, for $f_{N'|C}$, the family of densities of $Y_{N'}$ given $Y_C$.

Such graphs are named independence-preserving, when they can be used to de- 
rive the independence structure that would have resulted from the generating graph 
by conditioning on a larger node set {C,c} and marginalizing over the set {M,m}. 
Otherwise, such graphs are said to be only independence-predicting. Both types of 
graph transformations can be based on operators for binary matrices that represent 
graphs; see Wermuth, Wiedenbeck and Cox (2006), Wermuth and Cox (2004). 

From a given generating graph, three corresponding types of independence- 
preserving graph result by using the same sets C,M. These are in a subclass of the 
much larger class of MC-graphs of Koster (2002), studied as the ribbon-less graphs 
by Sadeghi (2012a), or they are the maximal ancestral graphs of Richardson and 
Spirtes (2002) or the summary graphs of Wermuth (2011); see Sadeghi (2012a) for 
proofs of their Markov equivalence. 

A summary graph shows when a generating conditional dependence, of $Y_i$ on
$Y_k$ say, in $f_N$ remains undistorted in $f_{N'|C}$, parametrized in terms of conditional dependences,
and when it may become severely distorted; see Wermuth and Cox
(2008). Some of these distortions can occur in randomized intervention studies, but
they may often be avoided by changing the set M or the set C. 

Therefore, these induced graphs are relevant for the planning stage of follow-up 
studies, designed to replicate some of the results of a given large study by using a 
subset of the variables, that is after marginalizing over some variables, and/or by 
studying a subpopulation, that is after conditioning on another set of variables. 

For marginalizing alone, that is in the case of $C = \emptyset$, one may apply the following
rules for inserting edges repeatedly, keep only one of several induced edges of the
same type, and often obtains again a regression graph induced by $N' = N \setminus M$. In general,
a summary graph results; see Wermuth (2011). The five transmitting Vs induce
edges by marginalizing over the inner node



i <--- ø <--- k,   i <--- ø --- k,   i --- ø --- k,   i <--- ø - - - k,   i <--- ø ---> k

to give, respectively,

i <--- k,   i <--- k,   i --- k,   i - - - k,   i - - - k.

The induced edges 'remember the type of edge at the endpoints of the V' when
one takes into account that each edge $\circ$ - - - $\circ$ in $G^N_{\mathrm{reg}}$ can be generated by a larger
graph, that contains $\circ$ <--- ø ---> $\circ$. Thereby, the independence structure implied
by this graph, for the node set excluding the hidden nodes, {ø}, is unchanged.

For any choice of $C, M$ and a given generating graph $G^N_{\mathrm{reg}}$, routines in the package
'ggm', contained within the computing environment R, help to derive the implications
for $f_{N'|C}$ by computing either one of the different types of independence-preserving
graph; see Sadeghi and Marchetti (2012). Other routines in 'ggm' decide
whether a given independence-preserving graph is Markov equivalent to another 
one or to a graph in one of the subfamilies, such as a concentration or a directed 
acyclic graph; see Sadeghi (2012b) for justifications of these procedures. This helps 
to contemplate and judge possible alternative interpretations of a given G^ g . 

For two regression graphs, the Markov equivalence criterion is especially simple: 
the two graphs have to have identical sets of node pairs with a collision V; see 
Theorem 1 of Wermuth and Sadeghi (2012). The result implies that the two sets 
may contain different ones of the 3 possible collision Vs. Also, the two sets of pairs 
with a transmitting V are then identical, though a given transmitting V in one graph 
may correspond in the other graph to another one of the 5 transmitting Vs that can
occur in $G^N_{\mathrm{reg}}$.
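
The criterion can be checked mechanically. The following Python sketch is only an
illustration and is not one of the routines in 'ggm'. It assumes that a graph is
given as a list of (a, b, kind) triples with kind 'arrow' (pointing from a to b),
'dashed' or 'full', that a collision V has uncoupled endpoints and an inner node
met by both edges with an arrowhead or a dashed line, and that the two graphs are
also required to couple the same node pairs. All function names are hypothetical.

```python
from itertools import combinations

def skeleton(edges):
    """Set of coupled node pairs, ignoring the type of edge."""
    return {frozenset((a, b)) for a, b, _ in edges}

def collides_at(edge, node):
    """True if the edge meets `node` with an arrowhead or as a dashed line."""
    a, b, kind = edge
    if kind == "dashed":
        return node in (a, b)
    if kind == "arrow":              # assumption: the arrow points from a to b
        return node == b
    return False                     # full lines never form a collision

def collision_pairs(edges):
    """Uncoupled pairs {i, k} that are the endpoints of a collision V."""
    skel = skeleton(edges)
    pairs = set()
    for e1, e2 in combinations(edges, 2):
        common = set(e1[:2]) & set(e2[:2])
        if len(common) != 1:
            continue
        inner = next(iter(common))
        ends = frozenset((set(e1[:2]) | set(e2[:2])) - {inner})
        if ends not in skel and collides_at(e1, inner) and collides_at(e2, inner):
            pairs.add(ends)
    return pairs

def markov_equivalent(g1, g2):
    """Same coupled pairs and same node pairs with a collision V."""
    return skeleton(g1) == skeleton(g2) and collision_pairs(g1) == collision_pairs(g2)

sink = [("i", "o", "arrow"), ("k", "o", "arrow")]    # i -> o <- k
mixed = [("i", "o", "dashed"), ("k", "o", "arrow")]  # i - - o <- k
chain = [("o", "i", "arrow"), ("k", "o", "arrow")]   # i <- o <- k
print(markov_equivalent(sink, mixed), markov_equivalent(sink, chain))
```

In this toy input the first two graphs contain different collision Vs for the same
node pair and are judged equivalent, whereas the third contains a transmitting V
and is not.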



4.2 Constructing the regression graph via statistical analyses 

As mentioned before, we use here data from the Mannheim Study of Children at
Risk. The study started in 1986 with a random sample of more than 100 newborns
from the general population of children born in the Rhine-Neckar region in
Germany. This sample was completed to give equal subsamples, in each of the nine
level combinations of two types of adversity, taken to be at levels 'no, moderate or
high'. In other words, there was heavy oversampling of children at risk.

The recruiting of families stopped with about 40 children of each risk level com- 
bination and 362 children in the study. All measurements were reported in standard- 
ized form using the mean and standard deviation of the starting random sample, 
called here the norm group. Of the 362 German-speaking families who entered the
study when their first, single child was born without malformations or any other
severe handicap, 347 families still participated when their child reached the age of
8 years.

Two types of risk were considered, one relevant for cognitive, the other for motor
development. One main difference from previous analyses is that we averaged three
assessments of each type of risk: taken at birth, at 3 months and at 2 years.
This is justified in both cases by the six observed pairwise correlations being all
nearly equal. The averaged scores, called 'Psycho-social risk up to 2 years', $Y_r$,
and 'Biological-motoric risk up to 2 years', $X_r$, have smaller variability than the
individual components. This points to a more reliable risk assessment and leads to
clearly recognizable dependences, to the edges present in Figure 2.
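
Why averaging reduces variability can be seen from a small sketch. It assumes an
idealized case in which the three assessments have the same variance and exactly
equal pairwise correlations rho; the value of rho used below is invented for
illustration and is not taken from the study. Under these assumptions the variance
of the average of k components is sigma^2 (1 + (k - 1) rho) / k.

```python
def variance_of_average(rho, k=3, sigma2=1.0):
    """Variance of the mean of k equicorrelated variables, each with variance sigma2."""
    return sigma2 * (1.0 + (k - 1) * rho) / k

rho = 0.5  # hypothetical common correlation, for illustration only
print(variance_of_average(rho))
```

For any rho below 1 the result is smaller than the variance of a single component,
which is the sense in which the averaged score is the more stable measurement.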

The regression equations may be read off Tables 2 to 7 below. For instance for
$Y_8$, there are four regressors and one nonlinear dependence on $X_4$ with

$$E_{\mathrm{lin}}(Y_8 \mid \text{past of } Y_8) = 0.03 + 0.78\,Y_4 + (0.07 + 0.10\,X_4)\,X_4 + 0.11\,E + 0.12\,H.$$

The test results of Table 2 imply that the previous measurement of cognitive deficits
at age 4.5 years, $Y_4$, is the most important regressor and that the next important
dependence is nonlinear and on motoric deficits at 4.5 years, $X_4$.
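
The nonlinear term can be made concrete by evaluating the fitted equation. The
sketch below uses the coefficients as printed in the equation above, reads the
regressor with coefficient 0.78 as the cognitive deficits at 4.5 years, and assumes
that all variables are in the standardized units of the norm group; the function
name is hypothetical.

```python
def e_lin_y8(y4, x4, e, h):
    """Fitted linear predictor for cognitive deficits at 8 years (coefficients as printed)."""
    return 0.03 + 0.78 * y4 + (0.07 + 0.10 * x4) * x4 + 0.11 * e + 0.12 * h

# Predicted change for a one-unit step in X_4 with the other regressors held at 0:
step_low = e_lin_y8(0, 1, 0, 0) - e_lin_y8(0, 0, 0, 0)   # 0.07 + 0.10 * 1
step_high = e_lin_y8(0, 2, 0, 0) - e_lin_y8(0, 1, 0, 0)  # 0.07 + 0.10 * 3
print(round(step_low, 2), round(step_high, 2))
```

The same one-unit step in motoric deficits is thus predicted to matter more at
higher levels of motoric deficits, which is what the quadratic term expresses.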

For each individual response component of the continuous joint responses, the
results of linear least-squares fittings are summarized in six tables. In each case, the
response is regressed in the starting model on all the variables in its past. Quadratic 
or interaction terms are included whenever there is a priori knowledge or a system- 
atic screening alerts to them; see Cox and Wermuth (1994). 

The tables give the estimated constant term and, for each variable in the
regression, its estimated coefficient (coeff), the estimated standard deviation of the
coefficient ($s_{\mathrm{coeff}}$), as well as the ratio
$z_{\mathrm{obs}} = \mathrm{coeff}/s_{\mathrm{coeff}}$, often called a studentized value.
Each ratio is compared to the 0.995 quantile of a standard Gaussian random
variable $Z$, for which $\Pr(|Z| > 2.58) = 0.01$. This relatively strict criterion for
excluding variables assures that each edge in the constructed regression graph
corresponds to a dependence that is considered to be substantively strong in the given
context, in addition to being statistically significant for the given sample size.
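
The threshold can be reproduced with the Python standard library. This is a small
sketch only; the helper name keep_regressor and the choice of "greater than or
equal" at the boundary are assumptions made for illustration.

```python
from statistics import NormalDist

z = NormalDist()                          # standard Gaussian distribution
threshold = z.inv_cdf(0.995)              # 0.995 quantile, close to 2.58
two_sided_tail = 2 * (1 - z.cdf(2.58))    # Pr(|Z| > 2.58), close to 0.01

def keep_regressor(coeff, s_coeff, cutoff=2.58):
    """True when the studentized value |coeff / s_coeff| reaches the cutoff."""
    return abs(coeff / s_coeff) >= cutoff

print(round(threshold, 2), round(two_sided_tail, 3))
```

With this helper, a coefficient of 0.78 with standard deviation 0.05 is retained,
while a coefficient of 0.07 with standard deviation 0.07 is not.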

At each backward selection step, the variable with the smallest observed value
$|z_{\mathrm{obs}}|$ is deleted from the regression equation, one at a time, until the
threshold is reached so that no more variables can be excluded. The remaining
variables are selected as the regressors of the response. An arrow is added for each
of the regressors to the graph containing just the nodes, arranged in
$g_1 < g_2 < \cdots < g_J$.
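
The backward selection step can be sketched in a few lines of plain Python. This
is a minimal illustration under stated assumptions and not the software used for
the analyses: ordinary least squares is solved through the normal equations, the
cutoff is 2.58, quadratic and interaction terms are left out, and the data are a
tiny invented design in which the regressor b carries no information on y. All
names are hypothetical.

```python
def ols(X, y):
    """Least-squares coefficients and the inverse of X'X (Gauss-Jordan elimination)."""
    p = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    aug = [xtx[a] + [float(a == b) for b in range(p)] for a in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    inv = [row[p:] for row in aug]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    return beta, inv

def studentized_values(data, response, regressors):
    """z_obs = coeff / s_coeff for each regressor, in a model with a constant term."""
    y = data[response]
    n = len(y)
    X = [[1.0] + [data[v][t] for v in regressors] for t in range(n)]
    beta, inv = ols(X, y)
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - len(beta))
    return {v: beta[j + 1] / (s2 * inv[j + 1][j + 1]) ** 0.5
            for j, v in enumerate(regressors)}

def backward_select(data, response, candidates, cutoff=2.58):
    """Delete the regressor with the smallest |z_obs| until all reach the cutoff."""
    selected = list(candidates)
    while selected:
        z = studentized_values(data, response, selected)
        weakest = min(selected, key=lambda v: abs(z[v]))
        if abs(z[weakest]) >= cutoff:
            break
        selected.remove(weakest)
    return selected

toy = {  # invented data, for illustration only
    "a": [1, -1, 1, -1, 1, -1, 1, -1],
    "b": [1, 1, -1, -1, 1, 1, -1, -1],
    "y": [2.5, -1.5, 2.5, -1.5, 1.5, -2.5, 1.5, -2.5],
}
selected = backward_select(toy, "y", ["a", "b"])
arrows = [(v, "y") for v in selected]  # one arrow per selected regressor
print(selected, arrows)
```

Each response is treated in this way with all variables in its past as candidates,
and the resulting arrows are collected into one graph.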

The last column in each table shows the studentized value $z'_{\mathrm{obs}}$ that
would be obtained if the variable were included next in the selected regression
equation. Wilkinson's model notation is added in the table to write the selected model
in compact form. For continuous responses, the coefficient of determination is
recorded for the starting model, denoted by $R^2_{\mathrm{full}}$, and for the reduced
model containing the selected regressors, denoted by $R^2_{\mathrm{sel}}$.

A dashed line is added, for a variable pair of a given joint response, when in the 
regression of one on the other, there is a significant dependence given their combined 
set of the previously selected regressors. 

A full line is added for a variable pair among the background variables, when
in the regression of one on all the remaining background variables, there is a
significant dependence of this pair. This exploits that an undirected edge present in
a concentration graph must also be significant in such a regression; see Wermuth
(1992).

Table 2 Regression results for $Y_8$

Response: $Y_8$, cognitive deficits at 8 years

| explanatory variables | starting model: coeff | starting model: s_coeff | starting model: z_obs | selected: coeff | selected: s_coeff | selected: z_obs | excluded: z'_obs |
|---|---|---|---|---|---|---|---|
| constant | 0.00 | | | 0.03 | | | |
| $Y_4$, cognitive deficits, 4.5yrs | 0.78 | 0.05 | 15.36 | 0.78 | 0.05 | 15.70 | |
| $X_4$, motoric deficits, 4.5yrs | 0.05 | 0.04 | | 0.07 | 0.04 | | |
| $Y_r$, psycho-social risk, 2yrs | 0.00 | 0.07 | 0.01 | | | | -0.13 |
| $X_r$, biol.-motoric risk, 2yrs | 0.07 | 0.07 | 1.07 | | | | 1.08 |
| $E$, Unprotect. environm., 3mths | 0.10 | 0.06 | 1.81 | 0.12 | 0.04 | 2.62 | |
| $H$, Hospitalisation up to 3mths | 0.09 | 0.05 | 1.91 | 0.12 | 0.04 | 3.00 | |
| $X_4^2$ | 0.09 | 0.01 | 6.53 | 0.10 | 0.01 | 7.15 | |

$R^2_{\mathrm{full}} = 0.67$; selected model $Y_8 : Y_4 + X_4^2 + E + H$; $R^2_{\mathrm{sel}} = 0.67$

This strategy leads to a well-fitting model, unless one of the excluded variables
makes too large a contribution when it is added alone to a set of selected regressors.
Such a variable would have to be included as an additional regressor. However, this
did not happen for the given set of data.

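Table 3 Regression results for $X_8$

Response: $X_8$, motoric deficits at 8 years

| explanatory variables | starting model: coeff | starting model: s_coeff | starting model: z_obs | selected: coeff | selected: s_coeff | selected: z_obs | excluded: z'_obs |
|---|---|---|---|---|---|---|---|
| constant | 0.26 | | | 0.26 | | | |
| $Y_4$, cognitive deficits, 4.5yrs | -0.01 | 0.06 | -0.10 | | | | 0.04 |
| $X_4$, motoric deficits, 4.5yrs | 0.33 | 0.04 | 7.39 | 0.33 | 0.04 | | |
| $Y_r$, psycho-social risk, 2yrs | 0.01 | 0.08 | 0.19 | | | | 0.43 |
| $X_r$, biol.-motoric risk, 2yrs | 0.17 | 0.08 | 2.27 | 0.19 | 0.06 | 2.97 | |
| $E$, Unprotect. environm., 3mths | 0.01 | 0.07 | 0.17 | | | | 0.44 |
| $H$, Hospitalisation up to 3mths | 0.01 | 0.08 | 0.26 | | | | 0.26 |
| $X_4^2$ | 0.18 | 0.23 | 3.41 | 0.05 | 0.02 | 2.89 | |

$R^2_{\mathrm{full}} = 0.36$; selected model $X_8 : X_4^2 + X_r$; $R^2_{\mathrm{sel}} = 0.36$
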
The tests for the residual dependence of the two response components give a
weak dependence at age 8 with $z_{\mathrm{obs}} = 2.4$ but a strong dependence at age
4.5 with $z_{\mathrm{obs}} = 7.0$.

Table 4 Regression results for $Y_4$

Response: $Y_4$, cognitive deficits at 4.5 years

| explanatory variables | starting model: coeff | starting model: s_coeff | starting model: z_obs | selected: coeff | selected: s_coeff | selected: z_obs | excluded: z'_obs |
|---|---|---|---|---|---|---|---|
| constant | -0.29 | | | -0.29 | | | |
| $Y_r$, psycho-social risk, 2yrs | 0.36 | 0.08 | 4.81 | 0.36 | 0.05 | 6.77 | |
| $X_r$, biol.-motoric risk, 2yrs | 0.17 | 0.09 | | 0.18 | 0.07 | | |
| $E$, Unprotect. environm., 3mths | -0.01 | 0.07 | -0.14 | | | | 0.39 |
| $H$, Hospitalisation up to 3mths | 0.14 | 0.04 | 3.36 | | | | -0.12 |
| $X_r^2$ | 0.14 | 0.04 | 3.36 | 0.14 | 0.04 | 3.36 | |

$R^2_{\mathrm{full}} = 0.25$; selected model $Y_4 : Y_r + X_r^2$; $R^2_{\mathrm{sel}} = 0.25$

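Table 5 Regression results for $X_4$

Response: $X_4$, motoric deficits at 4.5 years

| explanatory variables | starting model: coeff | starting model: s_coeff | starting model: z_obs | selected: coeff | selected: s_coeff | selected: z_obs | excluded: z'_obs |
|---|---|---|---|---|---|---|---|
| constant | -0.47 | | | -0.47 | | | |
| $Y_r$, psycho-social risk, 2yrs | 0.33 | 0.10 | 3.44 | 0.28 | 0.07 | 4.21 | |
| $X_r$, biol.-motoric risk, 2yrs | 0.62 | 0.11 | 5.50 | 0.50 | 0.09 | | |
| $E$, Unprotect. environm., 3mths | -0.06 | 0.08 | -0.66 | | | | -0.77 |
| $H$, Hospitalisation up to 3mths | -0.13 | 0.07 | -1.83 | | | | -1.88 |
| $X_r^2$ | 0.21 | 0.05 | 3.97 | 0.23 | 0.05 | 4.43 | |

$R^2_{\mathrm{full}} = 0.31$; selected model $X_4 : Y_r + X_r^2$; $R^2_{\mathrm{sel}} = 0.36$
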
Table 6 Regression results for $Y_r$

Response: $Y_r$, psycho-social risk up to 2 years

| explanatory variables | starting model: coeff | starting model: s_coeff | starting model: z_obs | selected: coeff | selected: s_coeff | selected: z_obs | excluded: z'_obs |
|---|---|---|---|---|---|---|---|
| constant | -0.20 | | | -0.21 | | | |
| $X_r$, biol.-motoric risk, 2yrs | -0.04 | 0.04 | -0.81 | | | | -1.51 |
| $E$, Unprotect. environm., 3mths | 0.57 | 0.03 | | 0.55 | 0.03 | | |
| $H$, Hospitalisation up to 3mths | -0.03 | 0.04 | -0.80 | | | | -1.50 |
| $E^2$ | 0.16 | 0.03 | 6.12 | 0.16 | 0.03 | 6.20 | |

$R^2_{\mathrm{full}} = 0.57$; selected model $Y_r : E^2$; $R^2_{\mathrm{sel}} = 0.56$

Table 7 Regression results for $X_r$

Response: $X_r$, biologic-motoric risk up to 2 years

| explanatory variables | starting model: coeff | starting model: s_coeff | starting model: z_obs | selected: coeff | selected: s_coeff | selected: z_obs | excluded: z'_obs |
|---|---|---|---|---|---|---|---|
| constant | 0.25 | | | 0.22 | | | |
| $Y_r$, psycho-social risk, 2yrs | -0.05 | 0.07 | -0.81 | | | | -1.22 |
| $E$, Unprotect. environm., 3mths | 0.17 | 0.06 | 3.04 | 0.12 | 0.04 | | |
| $H$, Hospitalisation up to 3mths | 0.48 | 0.04 | 12.30 | 0.48 | 0.04 | 12.40 | |
| $E^2$ | -0.04 | 0.03 | -1.09 | | | | -1.42 |

$R^2_{\mathrm{full}} = 0.35$; selected model $X_r : E + H$; $R^2_{\mathrm{sel}} = 0.35$

A global goodness-of-fit test, with proper estimates under the full model, may 
depend on additional distributional assumptions and require iterative fitting proce- 
dures. For exclusively linear relations of a joint Gaussian distribution, such a global 
test for the joint regressions would be equivalent to the fitting of a corresponding 
structural equation model, given the unconstrained background variables, and the 
global fitting of the concentration graph model to the context variables would corre- 
spond to estimation and testing for one of Dempster's covariance selection models. 



4.3 Using a well-fitting graph 

There are direct and indirect pathways from risks at three months to cognitive
deficits at 8 years. The exclusively positive conditional dependences along different
paths accumulate to positive marginal dependences, even for responses connected
only indirectly to a risk, for instance for $Y_8$ to $Y_r$ or to $E$.

Among the background variables, an unprotective environment for the 3-month-old
child, $E$, is strongly related to the psycho-social risk up to 2 years, $Y_r$, and
hospitalization up to 3 months, $H$, to the biological-motoric risk up to 2 years,
$X_r$. The weakest but still statistically significant dependence among these four
risks occurs for an unprotective environment, $E$, and the biological-motoric risk,
$X_r$.

Such a dependence taken alone can often best be explained by an underlying
common explanatory variable, here for instance a genetic or a socio-economic risk.
This would lead to replacing the full line for $(E, X_r)$ in Figure 2 by the
common-source V, shown in Figure 3. The inner node of this V is crossed out because it
represents a hidden, that is unobserved, variable. Hidden nodes represent variables
that are unmeasured in a given study but whose relevance and existence is known or
assumed.

Though Figure 3 appears to contain only a small change compared to Figure 2,
this change requires a Markov equivalence result for a larger class than regression
graphs, as available for the ribbon-less graphs of Sadeghi (2012a), since a path of

the type i oh — k does not occur in a regression graph. Given these results, it 

follows that the graphs of Figure 4(a) and (b) are Markov equivalent and that the
structure of graph (b) can be generated by the larger graph (c) that includes a
common, but hidden regressor node for the two inner nodes of the path.




Fig. 4 A hidden variable graph (c) generating two Markov equivalent graphs (a) and (b) 



To better understand the distinguishing features of the pathways of dependence in
Figure 2 leading to the joint responses of main interest at age 8, we generate the
implied regression graphs when the assessments at age 8 and at 4.5 years are available
for only one of the two aspects. In that case one has ignored, that is marginalized
over, the assessments of the other aspect at age 8 and 4.5.

The resulting graph, for $Y_8$ and $Y_4$ ignored, happens to coincide with the
subgraph induced by the remaining, selected six nodes in Figure 2, as shown in
Figure 5. Such an induced graph has the selected nodes and as edges all those present
among them in the starting graph and no more.
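
The notion of an induced subgraph is simple to state in code. The sketch below
uses a hypothetical toy edge list, not the edges of Figure 2, and a hypothetical
function name. It only keeps edges and never adds any, so it reproduces the graph
obtained by marginalizing only in special cases such as the one just described.

```python
def induced_subgraph(edges, keep):
    """Edges of the starting graph whose two endpoints both lie in `keep`."""
    keep = set(keep)
    return [(a, b, kind) for a, b, kind in edges if a in keep and b in keep]

# Hypothetical toy graph, for illustration only.
edges = [("A", "B", "arrow"), ("B", "C", "arrow"), ("A", "D", "full")]
sub = induced_subgraph(edges, {"A", "B", "D"})
print(sub)
```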



Fig. 5 The regression graph induced by ignoring $Y_8$ and $Y_4$ in Figure 2; $M = \{Y_8, Y_4\}$, $C = \emptyset$

The graph of Figure 5 implies that possible psycho-social risks of a child up to
age 2, $Y_r$, do not contribute directly to predicting motoric deficits at school age,
$X_8$, even when the more recent information on cognitive deficits is not available.

By contrast, the regression graph in Figure 6 that results after ignoring $X_8$ and
$X_4$ shows two additional arrows compared to the subgraph induced in Figure 2 by
$Y_8, Y_4, Y_r, X_r, E, H$.

Fig. 6 The regression graph induced by ignoring $X_8$ and $X_4$ in Figure 2; $M = \{X_8, X_4\}$, $C = \emptyset$

The induced arrows are for $(Y_8, Y_r)$ and for $(Y_8, X_r)$. The graph suggests that
cognitive deficits at school age, $Y_8$, are directly dependent on all of the
remaining variables when the more recent information on the motoric risks is
unrecorded. There are direct and indirect pathways from $H$ and from $E$ to $Y_8$.
They involve nonlinear dependences of cognitive deficits on previous motoric deficits
or risks. These are recognized in the fitted equations but not directly in the graph
alone.

What the graph also cannot show is that with $X_8, X_4$ unrecorded, the early risks
$Y_r, H$ are less important as predictors when $Y_4, X_r, X_r^2, E$ are available as
regressors of $Y_8$. This effect is due to the strong partial dependences of
$Y_r, E^2$ given $E, X_r, H$ and of $X_r, H$ given $E, E^2, Y_r$. Such implications,
due to the special parametric constellations, are not reflected in the graph alone.

Many more conclusions may be drawn by using just graphs like those in Figures 2 to 6.
The substantive research questions and the special conditions of a given study
are important; for some different types of study analyzed with graphical Markov
models see, for instance, Klein, Keiding and Kreiner (1995), Gather, Imhoff and
Fried (2002), Hardt et al. (2004), Wermuth, Marchetti and Byrnes (2012).

One major attraction of sequences of regressions in joint responses is that they
may model longitudinal data from observational as well as from intervention studies.
For instance, with fully randomized allocation of persons to a treatment, all arrows
that may point to the treatment in an observational study are removed from the
regression graph. This removal reflects a successful randomization: independence
of the treatment variable from all regressors or background variables is assured,
no matter whether they are observed or hidden.

The paper combines two main themes. One is the notion of traceable regressions.
These are sequences of joint response regressions together with a set of background
variables for which an associated regression graph not only captures an independence
structure but permits the tracing of pathways of dependence. Study of such
structures has a long history and at the same time is the focus of much current
development.

Joint responses are needed when causes or risk factors are expected to affect several
responses simultaneously. Such situations occur frequently and cannot be adequately
modeled with distributions generated over a directed acyclic graph or such
a graph with added dashed lines between responses and variables in their past to
permit unmeasured confounders or endogenous responses.

A regression graph shows, in particular, conditional independences by missing 
edges and conditional dependences by edges present. The independences simplify 
the underlying data-generating process and emphasize the important dependences 
via the remaining edges. The dependences form the basis for interpretation, for the 
planning of or comparison with further studies and for possible policy action.
Propagation of independences is now reasonably well understood. There is scope for
complementary further study that focuses on pathways of dependence. 

The second theme concerns specific applications. Among the important issues 
here are an appropriate definition of population under study, especially when rela- 
tively rare events and conditions are to be investigated, appropriate sampling strate- 
gies, and the importance of building an understanding on step-by-step local anal- 
yses. The data of the Mannheim study happen to satisfy all properties needed for 
tracing pathways of dependence. This permits discussion of the advantages and lim- 
itations for some illustrated path tracings. 

In the near future, more results on estimation and goodness-of-fit tests are to
be expected, for instance by extending the fitting procedures for regression graph
models of Marchetti and Lupparelli (2010) to mixtures of discrete and continuous
variables. Also to be expected are more results on the identification of models that
include hidden variables, such as those by Stanghellini and Vantaggi (2012) and those
by Foygel, Draisma and Drton (2012), and further evaluations of properties of
different types of parameters; see Xie, Ma and Geng (2008) for an excellent starting
discussion.